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STABILITY AND UNIQUENESS OF p -VALUES FOR LIKELIHOOD-BASED INFERENCE 


THOMAS J. DICICCIO, TODD A. KUFFNER, G. ALASTAIR YOUNG, AND RUSSELL ZARETZKI 


Abstract. Likelihood-based methods of statistical inference provide a useful general methodology that is 
appealing, as a straightforward asymptotic theory can be applied for their implementation. It is important to 
assess the relationships between different likelihood-based inferential procedures in terms of accuracy and 
adherence to key principles of statistical inference, in particular those relating to conditioning on relevant 
ancillary statistics. An analysis is given of the stability properties of a general class of likelihood-based 
statistics, including those derived from forms of adjusted profile likelihood, and comparisons are made 
between inferences derived from different statistics. In particular, we derive a set of sufficient conditions 
for agreement to O p (n -1 ), in terms of the sample size n, of inferences, specifically p- values, derived from 
different asymptotically standard normal pivots. Our analysis includes inference problems concerning a 
scalar or vector interest parameter, in the presence of a nuisance parameter. 


1. Introduction 

A highly useful statistical methodology for inference on a scalar or vector interest parameter in the 
presence of a nuisance parameter is furnished by procedures based on the likelihood function, including 
tests and confidence sets based on the likelihood ratio statistic. Though no explicit optimality criteria 
are invoked, a quite general asymptotic theory allows straightforward implementation of such methodol¬ 
ogy in a wide range of settings. However, accuracy and what may be termed inferential correctness are 
(Young (2009)) key desiderata of any parametric inference. When constructing, say, a confidence set for 
a parameter of interest in the presence of nuisance parameters, we desire high levels of coverage accu¬ 
racy from the confidence set. Further, it is important that procedures are inferentially correct, meaning 
that they respect key principles of inference, in particular those relating to appropriate conditioning on 
ancillary information when this is relevant. The crucial issue here is the stability of the statistic used for 
inference, the extent to which the unconditional distribution of the statistic agrees with the conditional 
distribution of the statistic, relevant for achieving inferential correctness. Henceforth, when speaking of 
the stability of a pivot, we mean whether or not its marginal distribution inherently respects ancillary 
information. Specifically, a statistic which is stable to second-order is one whose conditional distribution 
given the observed value of an ancillary statistic agrees to second-order, ()(rr l ), in the sample size n 
with its marginal distribution. Our objective in this paper is to both analyse and elucidate properties of 
likelihood-based methods of statistical inference against these desiderata, and to provide new results that 
shed light on what is achieved by alternative approaches to implementation of likelihood-based methods 
of inference. We make two novel contributions. 


Key words and phrases, adjusted profile likelihood; ancillary statistic; likelihood; modified signed root likelihood ratio 
statistic; nuisance parameter; pivot; stability. 
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We provide a general assessment of the stability properties of likelihood-based statistics commonly 
used for parametric inference. Our analysis considers first the case of the signed root likelihood ratio 
statistic for inference on a scalar interest parameter, in the presence of a nuisance parameter. In doing 
so, we establish a generalization to the practically realistic context involving nuisance parameters of re¬ 
sults described by McCullagh (1984) and Severini (1990). We then discuss this issue for asymptotically 
standard normal pivots more generally, in particular those constructed from adjusted forms of profile 
likelihood, before considering inference for vector interest parameters. The results presented here al¬ 
low comparisons to be drawn between the inferential properties of parametric bootstrap procedures and 
techniques of higher-order inference based on asymptotic, analytic approximation. 

We also provide an explicit comparison of inferences, specifically p-values, obtained from differ¬ 
ent asymptotically standard normal pivots, including those constructed from adjusted forms of profile 
likelihood, establishing certain higher-order equivalences and differences. We derive a set of sufficient 
conditions ensuring agreement of p -values derived from different asymptotically standard normal pivots, 
to order () p (rr 1 ). 


2. Background 

Suppose that Y = (>j,..., Y n ) is a continuous random vector and that the distribution of Y depends 
on an unknown d-dimensional parameter 9, partitioned as 0 = (0, 0), where initially we suppose 0 = 9\ 
is a scalar interest parameter and 0 is a nuisance parameter of dimension d — 1. We later consider the 
case of a vector interest parameter 0. 

Let L(0) be the loglikelihood function for 6 based on Y and let 0 = (0, 0) be the global maximum 
likelihood estimator of 9. Further, let 9 = 0(0) = (0,0) = (0,0(0)} be the constrained maximum 
likelihood estimator of 9 for given 0. Then the profile loglikelihood function for 0 is M(0) = L{9(vj) } 
and the likelihood ratio statistic for 0 is IL(0) = 2{M(0) — M(0)}, where M(0) = L{9), since 0(0) = 
9. The signed root likelihood ratio statistic is f?(0) = sgn(0 — 0){kF(0)} 1 / 2 . Testing H 0 : 0 = 0 O 
against H a : 0 > 0 O or H„ : 0 < 0 O can be based on the test statistic f?(0 0 ). Asymptotically, as 
the sample size n increases, the sampling distribution of H(Y) tends to the standard normal distribution. 
Heading the list of desiderata for refinement of the inference procedures furnished by such first-order 
asymptotic theory is the achievement of higher-order accuracy in distributional approximation, while 
respecting the need for inferential correctness. 

Two main routes (Young (2009)) to higher-order accuracy emerge from contemporary statistical the¬ 
ory. The most developed route is that which utilises analytic procedures, based on ‘small-sample asymp¬ 
totics’, such as saddlepoint approximation and related methods, to refine first-order distribution theory. 
The second route involves simulation or bootstrap methods, which aim to obtain refined distributional 
approximations directly, without analytic approximation: see, for instance, DiCiccio, Martin and Stern 
(2001), Lee and Young (2005), DiCiccio and Young (2008). 

A detailed account of analytic methods for distributional approximation which yield higher-order ac¬ 
curacy is given by Barndorff-Nielsen and Cox (1994). Two particular highlights of an intricate theory 
are especially important: Bartlett correction of the likelihood ratio statistic fL(0), which we discuss in 
Section 8, and the construction of analytically modified forms of the signed root likelihood ratio statistic 
f?(0), designed to offer higher-order accuracy. These procedures also provide inferential correctness, 
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specifically conditional validity, to high (asymptotic) order, in the two key settings where conditional 
inference is crucial, namely multi-parameter exponential family and ancillary statistic contexts. Partic¬ 
ularly central to the analytic approach to higher-order accurate inference on a scalar interest parameter 
is Barndorff-Nielsen’s IT* statistic (Barndorff-Nielsen (1986)). In both the multi-parameter exponential 
family and ancillary statistic contexts, the R* statistic is conditionally, and hence unconditionally, dis¬ 
tributed as standard normal, to error of third-order (){nr 3/2 ) in the sample size. So, analytic standard 
normal approximation of the sampling distribution of the IT* statistic yields third-order accuracy under 
repeated sampling, while respecting the requirements of conditioning to that same order. 

Lawley (1956) showed that E g {R(ip)} = n~ 1 ^ 2 m(9) + 0(n~ 3 / 2 ) and var e {f?(6 1 )} = 1 + n~ 1 v{9 ) + 
0(n~ 2 ), where rn(f)) and v(9) are both of order 0(1), while the third and higher-order cumulants are of 
order 0(?rr 3 / 2 ) or smaller; see also Bickel and Ghosh (1990). Therefore, {R(ip) — n~ 1 ^ 2 m(9)}/{ 1 + 
n~ 1 v(9 )} 1 / 2 has the standard normal distribution to error of order 0(n 3/ 2 ). DiCiccio and Stem (1994a) 
showed that { R(ip) — n~ 1 ^ 2 m(9)}/{l + n~ 1 v{9)} 1 ^ 2 also has the standard normal distribution to error of 
order 0(n _3//2 ). This DiCiccio and Stem (1994a) result asserts that [R{ip) — EQ{R(ip)}]/\ya,ig{R{ip)}] 1 / 2 
is also distributed as standard normal to error of order ()(n 3/2 ). In turn, this distributional result im¬ 
mediately suggests the parametric bootstrap approaches to third-order accurate inference discussed by 
DiCiccio et al. (2001) and Lee and Young (2005). For testing H 0 : ip = ip 0 against one-sided alterna¬ 
tives, p —values distributed, under repeated sampling, as uniform to error of order 0(n~ 3 / 2 ), and hence 
yielding error rate 0 (tt, - 3 / 2 ), can be obtained by bootstrapping R(ip 0 ) at the parameter value 9 = (ip 0 ,(p 0 ), 
where <p 0 = 4>(ipo)- DiCiccio and Young (2008) show that this parametric bootstrap procedure respects 
the requirements of conditioning in multi-parameter exponential family settings to third-order. 

From a repeated sampling perspective, such third-order accurate inference can be similarly obtained 
(Lee and Young, 2005) by bootstrap approximation to the sampling distribution of other asymptotically 
standard normal pivots, in particular, pivots constructed as standardized versions of the difference ip — 
ip 0 or the score function dM(ip) / dip |</,=</, 0 , that avoid calculation of both the global and constrained 
maximum likelihood estimators, and may therefore may be more appealing for use in a computationally- 
intensive bootstrap inference. A fundamental question that arises concerns the inferential implications of 
choice of a particular statistic: when do inferences based on different choices of statistic agree to high- 
order? It is also necessary to ask whether such inference respects the requirements of conditioning on 
relevant ancillary statistics, in models which admit the existence of such. Since a bootstrap calculation 
involves unconditional sampling at parameter value 9 = ( ip 0 , </> 0 ), the key question is the extent to which 
the conditional and unconditional distributions of the statistic being used for the inference differ. 

In this paper we provide an analysis directed at these questions, providing new results on the stability 
properties of likelihood-based statistics and agreement of p -values derived from different asymptotically 
normal pivots. The implications of the analysis for bootstrap methodology and detailed comparisons of 
the latter with analytic procedures of inference will be described elsewhere. 

We consider first the stability properties of the signed root statistic R(ip)', in doing so, we establish 
a generalization to the nuisance parameter context of a result of McCullagh (1984): see also Severini 
(2000, Section 6.4.4). We then discuss the stability issue in problems involving nuisance parameters for 
asymptotically standard normal pivots more generally, before examining conditions which ensure that 
/Lvalues derived from two different pivots agree to second-order. Extension of the conclusions to test 
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statistics based on general adjusted forms of profile likelihood are described, before presenting results 
concerning inference for vector interest parameters. 

Our analysis is concerned exclusively with inferential comparisons ‘under the null’ so, for instance we 
examine the unconditional and conditional distributions of the signed root statistic R('ip) under the model 
in question when the true parameter value is 9 = (t/j, (j)). Similarly, the analysis concerns comparison of 
different p -values under assumed correctness of the null hypothesis being tested. 

3. Notation 

In the calculations that follow, arrays and summation are denoted by using the standard conventions, 
for which the indices r, s, t,... are assumed to range over 1 ,,d. Summation over the range is implied 
for any index appearing in an expression both as a subscript and as a superscript. Differentiation is 
indicated by subscripts, so L r (9) = 8L(9)/89 r , L rs (9 ) = d 2 L(9) / 89 r 89 s , etc. Then E{L r (9)} = 
0, let A rs E{L rs (9)y, A rst E{Erst(9)\, etc., and put l r L r (9), l rs L rs (Q ) A rs , lrst 

L rst (9 ) — X rst , etc. The constants X rs , X rst ,..., are assumed to be of order 0{n). The variables l r , l rs , 
l rst , etc., each of which have expectation 0, are assumed to be of order O p (n l l 2 ). The joint cumulants 
of l r , l rs , etc. are assumed to be of order 0(n). These assumptions are usually satisfied in situations 
involving independent observations. The observed information matrix is J(9) = | -L rs (9)\, while the 
expected (Fisher) information matrix is 1(9) = [—A rs (0)]. It is useful to extend the A-notation: let 
X V)S = E(L r L s ) = E(l r l s ), X rStt = E(L rs L t ) = E(l rs l t ), etc. The Bartlett identities involving the A’s 
can be derived by repeated differentiation of the identity J exp {L(9)}dy = 1; in particular, 

A rs T K,s 0? A rs t T X rSjt + A rt,s + Kt,r + K,s,t — 0. 

Differentiation of the definition A rs = f L rs (9 ) exp {L(9)}dy yields X rs / t = X rst + X rs>t , where X rs / t = 
8X rs /89 t . Further, let (A rs ) be the d x d matrix inverse of (A rs ), and let r] = —1/A 11 , r rs = r]X lr X ls , 
and u rs = X rs + T rs . Thus, X rs , r rs , and u rs are of order 0(n -1 ), while rj is of order 0(n). For clarity, 
we point out that a superscript or subscript of T’ refers to the scalar interest parameter ip, where ip is the 
first component of 9. 

Suppose that A is an ancillary, i.e., distribution constant, statistic such that (9, A) is sufficient. To 
distinguish conditional calculations from unconditional ones, the accent symbol ° is used to denote quan¬ 
tities derived from the conditional distribution of Y given A. Since the conditional loglikelihood L(9) 
differs from the unconditional loglikelihood L(9) by a quantity that depends on A but not on 9, it follows 
that FF(-0) = W(^) and that L r = L r , L rs = L rs , etc. Let A rs = E{L rs (9)}, X rst = E{L rst (9)}, 
etc., and put l r = l r (9), l rs = L rs (9 ) — X rs , l rst = L rst (9 ) — X rst , etc. The quantities X rs , X rst , etc. 
are random variables depending on A, assumed to be of order O p (n). The variables l r , l rs , lrst, etc. 
have conditional expectation 0, so they also have unconditional expectation 0, and they are assumed to 
be of order () p (n l /2 ). Further, the joint conditional cumulants of l r , l rs , etc. depend on A, and they are 
assumed to be of order O p (n). It is useful to extend the A-notation by letting X r s = E(L r L s ) = E(l r l s ), 
X rSt t = E(L rs L t ) = E(l rs l t ), etc. Also, let (A rs ) be the d x d matrix inverse of (A rs ), and let rj = —1/A 11 , 
T rs = f)X lr X ls , and u rs = X rs + T rs , so that X rs , f rs , and u rs are of order O p (n _1 ), while fj is of order 
Op(n). 
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Following Bamdorff-Nielsen and Cox (1994, Section 7.2), construction of an ancillary statistic A such 
that (0, A) is sufficient is, except in rather special cases, only possible for transformation models and, in a 
degenerate sense, for full exponential family models, where 9 itself is sufficient. It is therefore in general 
necessary to consider conditioning on statistics A which are approximately ancillary in a suitable sense. 
Results presented here continue to hold under the assumption that A is locally ancillary (Cox (1980)). Let 
8 0 be an arbitrary but specified parameter value, and let A = A(Y, 0 O ) be a candidate ancillary statistic. 
If the density of A under parameter value 9 0 + n~ l ^ 2 5 satisfies 

f A (a; 6 0 + n~ l,2 5) = f A (a\ 0 O ){1 + 0(n~ q/2 )}, 

then (Cox (1980), McCullagh (1987, Section 8.3)) A is said to be g-th order local ancillary in the vicinity 
of 0 O - Note that this definition applies only to parameter values in an ()(n "A 2 ) neighbourhood of 9 0 : 
if 9 0 is the true parameter value, as n increases the likelihood function becomes negligible outside this 
neighbourhood. The loglikelihood function based on A satisfies L A (9 0 + n~ 1//2 <5) = L A (8 0 ) + 0(n~ q / 2 ). 
As is the case in the no nuisance parameter context considered by Severini (1990) and McCullagh (1987, 
Section 8.4), results in Section 4 relating to stability of asymptotically standard normal pivots continue to 
hold for any second-order local ancillary A, as do results in Section 8 concerning stability of an adjusted 
profile likelihood ratio statistic. Essentially, the assumption of a second-order local ancillary is sufficient 
to ensure the relationships detailed below between conditional and unconditional cumulants. 

The technique of proof used here to compare the conditional and unconditional distributions of asymp¬ 
totically standard normal pivots to second order is a generalization of that described by Severini (2000, 
Chapter 6) in the case of a scalar interest parameter without nuisance parameters. For this technique, it 
is essential to compare the A-quantities with their A-counterparts. 

We first investigate the difference between X rs and X rs ; note that X rs = E(L rs ) = E{E(L rs )} = 
E(X rs ). Furtheimore, var(A rs ) = var {E(L rs )} = var (L rs ) — E{vhv(L rs )} = 0(n) — E{O p (n)} = 
0(n), and consequently, X rs = X rs + O p (n 1//2 ). An identical argument shows that X rst = X rst + O p (n 1</2 ), 
etc. 

Assume that differentiation of the identity X rs = X rs + O p (n 1/2 ) yields X rs /t = X rs / t + O p (n 1/2 ), 
where X rs / t = dX rs /d9 t and, as before, X rs / t = dXm/dO 1 . We note that, as a rule, differentiation 
of an asymptotic relation will preserve the asymptotic order, but that care is necessary; see Bamdorff- 
Nielsen and Cox (1994, Exercise 5.4) and Pace and Salvan (1994). The asymptotic order of the difference 
between X rs / t and X rs / t indicated here, therefore, actually constitutes an additional assumption of our 
calculations. The preceding results imply X rS}t = X rSjt + O p (n 1 / 2 ), since the Bartlett identities X rs / t = 

A rst T A rs,t RHd A r s/t A r st T A rs,t yield X rs j Xrs/t ^rst A rs/t Kst + O p (n 1/2 ) = X rs,t + O p (n 1/2 ). 

Define A rs = A rs — X rs , so that A rs is a function of 9 and A, having order O p (n 1//2 ). Then l rs = 

Lrs A rs (L rs T A r s ) ^rs ^ rs• 

4. Stability result for R(^) and other pivots 
W e now consider the stability of R(E) and other asymptotically standard normal pivots. 

4.1. RNi) is a stable pivot to second order. 
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Theorem 1. The conditional and unconditional distributions of R(v') agree to error of order 0{n 1 ), 
given the ancillary statistic A. 


Proof To error of order O (n ~ l ), the variance of R(f>) is 1 and the third- and higher-order cumulants are 
0; the mean is of order ()(rr l i 2 ). The conditional distribution given A has the same cumulant structure 
as the unconditional distribution. Thus, to show that the conditional and unconditional distributions agree 
to second-order, it suffices to show that E{R(f)} = E{R(^)} + O p (n _1 ). 

Standard calculations, such as those given by Lawley (1956) and detailed in the Appendix of DiCiccio 
and Stern (1994b), show that W has the expansion 

W(i 0 = T rs l r l s - 2A rt T su Utlu - r rt T su Utlu + \ ru v sv T tw \ rst l u l v l w 

+ l T ru r sv r t WXrstlJvlw + 0p ( n -l) 

DiCiccio and Stern (1994b) showed that R(f) may be decomposed as R(ip) = r f^ 2 {Ri + R 2 + 
O p {n~ 3 ^ 2 )}, where R\ = —A lr l r and 

R 2 = \ lr \ S H rs l t + \\ lr T St Ut ~ \\ lr \ SU V tV \ rstUv ~ \\ lr T SU T tV \ r JJ v . 

Here R\ is of order O p {n~ 1 ^ 2 ) and R 2 is of order O p (n _1 ). Since E(Ri) = 0, it follows that 

E{R(f)} = v 1/2 {X lr X st X rStt + \X lr T st X rS)t + \X lr X st X rst + \X lr r st X rst } + 0(n~ l ). 

Note also that Ri = —A lr l r = —A lr l r and 


R2 


X lr X st Ut + \X lr T st Ut - \X lr X su v tv X rst l u l v - \X lr T su T tv X rst l u l v 
X lr X st Ut + X lr X st A J t + \X lr T st iJ t + \X lr T st kJ t 
- \X w X su u tv X r JJ v - \X lr T su T tv X r JJ v . 


Thus, since E(Rf) = 0, 


E{R(^)} = rj 1/2 { X lr X st X rs , t + \X lr r st X rs ,t + \X lr X su v tv X rst X uv + \X lr r su r tv X rst X uv + O p {n ~ 3/2 )} 
= 77 1 / 2 {A lr ’A st A T . S!t + \X lr T st X rs , t + \X lr X su v tv X rst X uv + \X lr T su r tv X rst X uv + O p (n~ 3 / 2 )} 
= r)R 2 {X lr X st X rs , t + \X lr r st X rs , t + | A lr A^A rst + \X lr r st X rst + O p (n~ 3 ' 2 )} 

= E{R(f>)} + O p {n~ l ). 


It follows that the conditional distribution of R(f) differs from its marginal distribution by error of order 
0(n _1 ), given A. □ 


McCullagh (1984) generalized the notion of the signed root statistic to the case of a vector interest 
parameter and established this stability result in the case of no nuisance parameters; Severini (1990) gave 
a further demonstration for the case of a scalar interest parameter with no nuisance parameters. There¬ 
fore, the result shown here extends the work of McCullagh and Severini to situations where nuisance 
parameters are present. 

This second-order stability of R(f) for the nuisance parameter context has been discussed, but not 
demonstrated formally as we have here, by Pierce and Bellio (2006). The methodological consequence 
of the result is immediate. Any approximation to the unconditional distribution of R(f>) having error of 
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order 0(n ~ l ) also approximates the conditional distribution of R(ip) to the same order of error. Such an 
approximation may (DiCiccio et al. (2001)) be derived, for instance, from the bootstrap distribution of 
R(ip). If that approximation is then used, say, to construct confidence limits for ip, then those limits have 
coverage error of order 0(rir l ), conditionally as well as unconditionally. 


4.2. Stability of other asymptotically standard normal pivots. We now consider general asymptoti¬ 
cally standard normal pivots of the form T(ip) = rp^ 2 {T\ + T 2 + O p (n _3 / 2 )}, where 7) = —A lr l r and T 2 
is of the form T 2 = p rst l rs h — £ rs Us> with Pf st and £ rs assumed to be of order 0(n~ 2 ), so that T) is of 
order O p (n~ l R) and T 2 is of order O p (n~ l ). We demonstrate that commonly used pivots may all be ex¬ 
pressed in this form; for example, for R(ip), the preceding expansions show that £ rst = \ lr X st + |A lr r st 
and £ r,s = ^X u X ur u vs X tuv + ^X u T ur r vs X tuv . Both conditionally and unconditionally, the fourth- and 
higher-order cumulants of such a pivot are immediately seen to be of order 0(n ~ l ) or smaller. Con¬ 
sequently, if we are to show that the conditional and unconditional distributions of these pivots agree to 
error of order 0(n _1 ) given A, all we need to show is that the first three conditional cumulants agree with 
the unconditional ones to error of order O p (n _1 ). We show that the first and third conditional cumulants 
agree with the unconditional ones to the required order of error without further restrictions on £ rs and 
£ rst . We demonstrate that for the second conditional cumulant to agree to with the unconditional one a 
sufficient condition is that £ rsl = |A lr A ls . It is easy to see that R(ip) satisfies this criterion for, in this 
case, 


f sl = A lr A sl + |(A%A ls A n ) = A lr A ls + i{A lr (-l/A n )A ls A n } = A lr A ls - |A lr A ls = PX ir X 


1_ \ lr \ Is 
2 ' 


Theorem 2. The unconditional and conditional distributions of T(ip) agree to error of order 0(n 1 ) 
given the ancillary statistic A. 


The result follows immediately from three lemmas concerning the stability of the first three cumulants 
of T(ip ), beginning with the first cumulant, the mean. 

Lemma 1. E{T(iP)j = E{T(iP)} + O p (n - 1 ). 

Proof Recall that 7\ = —A lr l r = —X lr l r and that T 2 = Pf st l rs h — £ rs Ms = £ rst (Ls + A rs )k ~ £ rs lJs- 
Then, E{T(iP)} = v 1/2 {C st X rs>t + C s Ks + 0(n~ 3 / 2 )} and 

E{T(ip)} = r, 1/2 {r t0 Ks,t + C s Ks + O p (n~ 3 / 2 )} 

= V 1/2 {e st Ks,t + C s Ks + 0 P (n- 3/2 )}. 

Therefore, the conditional first cumulant agrees with the unconditional one to error of order O p (nr 1 ), as 
required. □ 


Lemma 2. Ifp rsl = |A lr A ls , then var{T(^)} = var{T(-0)} + O p (n x ). 

Proof See Appendix. □ 


Lemma 3. skew {T(ip)} = skew{T(-0)} + O p (n x ). 
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Proof. See Appendix. □ 

A sufficient condition for var{T(^)} = var {T(f>)} + O p (n -1 ) is £ rsl = ■|A lT ’A 1,s ; if this holds, we 
have skew{T('0)} = p 3/,2 (A lr A ls A u A rs t — 6£ n ) + 0(n~ l ). 

5. Comparison of p -values 

Our objective here is to utilize preceding calculations to examine conditions which ensure that p-values 
based on two different asymptotically normal pivots agree to second-order. Here we refer to the p-value 
calculated from the exact sampling distribution of the pivot, or any approximation to the exact p-value 
accurate to O p (n _1 ). Such accuracy of approximation is obtained, for instance, quite generally for an 
asymptotically normal pivot by bootstrapping (Lee and Young (2005)), but would not be obtained by the 
normal approximation. 

Consider hypothesis testing for i/j based on a test statistic expressible as T(f>) = + T 2 ) + 

O p (tt, _ 1 ), where Tj = —A lr l r and T 2 is of the form T 2 = C rst l r sh ~ £ rs Ms> with £ rst and £ rs assumed to 
be of order 0{n~ 2 ). We have shown that the first three cumulants of T(pp) are 

A! = E{T{^)} = v 1/2 (e st Ks,t + C Ks) + 0(n _1 ), 
n 2 = var{T(-0)} = 1 + 0{n~ l ), 

k 3 = skew{T(^)} = r/ 3 / 2 (A lr A ls A lf A rs j + 3A lr A ls A u A rS)f - sl A u A rs , t - 6£ n ) + C^n" 1 ), 

while the fourth- and higher-order cumulants are of order Ofi" 1 ) or smaller. 

Consider another test statistic T(ip) = + T 2 ) + O p (n _1 ), where Ti = —A lr l r = Ti and T 2 is of 

the form T 2 = i rst l rs l t — C s l r h, with £ rst and £ rs assumed to be of order 0(n~ 2 ). Our goal is to establish 
conditions on the two pivots T(vj) and T(f)) which ensure that p -values agree to second-order. 

Theorem 3. If the conditions 

(1) C st = C st + 0(n~ 5/2 ), 

(2) r+ l tu \ u r rs = c + e u \ U T rs + o(n~ 5 / 2 ), 

are satisfied, then the p-value derived from the pivot T(L) agrees with that derived from the pivot f('ip) 
to error of order O p {n~ l ). 

Proof The p-value for testing against alternatives greater than is the right-hand tail probability for 
T(f>). The normalizing Comish-Fisher expansion shows that the p-value is 

1 - <f>(p 1/2 Ti + p 1/2 T 2 - §K 3 pT 2 - «i + Jk 3 ) + O p (n _1 ), 

where <&(•) denotes the standard normal cumulative distribution function. 

Let the first three cumulants of T(ip) be denoted by kq, k 2 , k 3 ; the p-value based on T(ip) is 

1 - <f>(p 1/2 T! + r/ 1/2 f 2 - |k 3 pT 2 - «! + J« 3 ) + O p (tt, _1 ). 
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We now determine sufficient conditions on £ rs and to ensure that the p -value obtained from T('ijj) 
agrees with that obtained from T('iji) to error of order () p (rr l ). Agreement of the values to this order 
occurs when 

p 1,2 T 2 - r)Ti -K 1 + |k 3 = T) 1/2 T 2 - jU 3 - k i + |« 3 

to error of order O p (n~ 1 ), that is when 

W /2 (T 2 - T 2 ) - i («3 - /-c 3 )r/T 2 } - {(/ci - Ki) - |(k 3 - k 3 )} = Op(n _1 ). 

The first term on the left-hand side of the preceding equation is random, as it involves terms of the 
form l rs l t and l r l t , while the second term is a constant. Consequently, by separating the random and 
non-random components, we see that the preceding equation actually stipulates two conditions: 

V 1/2 (T 2 - T 2 ) - i(« 3 - K 3 )r}Tf = Op(n _1 ), 

(«i - «i) - |(« 3 - « 3 ) = 0{n~ l ). 

The second of these equations gives (ki — Ki) = |(k 3 — k 3 ) + 0(n -1 ), so we can write the equations as 

(3) v 1/2 & 2 - T 2 ) - (K! - K1 ) V T? = O^rT 1 ), 

(4) («i - «i) - |(k 3 - k 3 ) = 0(n _1 ). 

Since r/T 2 = (—l/X u )X lr X ls l r l s = T rs l r l s , © yields 

( 5 ) - C%slt - - DUs - - S tUV )Xtu,v + - e U )\tu}T rS lrl s ] = Opin- 1 ). 

The quantity r] 1 / 2 {(£ rst — £ rst )lrsk — {i tuv — ^ tuv )\ tu,v} in © is reduced to order O p (n~ l ) if © holds. 

The remaining term r] 1 / 2 {(£ rs — £ rs ) + (£, tu — £, tu )\ t u T rs }l r ls in © is reduced to order O p {n~ l ) if © 
holds. We show that © is satisfied when © and © hold. Now © yields 

+ (r - rw,} + ^{(r 1 - f 1 -«“} = o («-■), 

and © yields £ rsl = £ rsl + 0(n~ 5//2 ). Under this condition, © reduces to 

v 1/2 {{? 8 - DKs} + v 3 / 2 (Z n - £ u ) = 0{n~ l ). 

Since r 11 = —A 11 = rf 1 , © gives £ n — £ n = rj~ 1 (^ rs — £ rs )A rs + 0(n~ 5 ^ 2 ), and hence, it follows that 
under © and ©, © is satisfied. □ 


Note that © and © together constitute necessary and sufficient conditions for the p-values to agree 
to order O p (n _1 ). The quantity on the left side of © is of the form r] ll/2 (A rst l rs l t — B rs l r l s ), where 


^rst _ ^rst _ ^ 


rst 


rst 


TDrs (ctuv rtuv\ \ rs . trs crs . (t-tu rtu\ \ r 

& = JA tu ,vT + £ + (£ )*tuT 


so a necessary condition for agreement in general of p-values to order O p {n ~ l ) is that A rst and B TS both 
be of order 0(n~ 5//2 ). The condition that A rst is of order 0(n ~ 5//2 ) is the same as © and, in light of 
this condition, that B rs be of order 0{n~ r ^ 2 ) is equivalent to ©. Thus, © and © are necessary for 
agreement of p-values to order O p {n~ l ). Of course, it is possible that the p-values from two test statistics 
T(ip) and T(w) fail to agree to order () p (rr l ) for arbitrary models, yet they do agree for some specific 
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model owing to particular features of the model. This situation could be revealed by verifying conditions 
© and © for the specific model. 


6. Examples 


To illustrate the results of the previous sections, we consider eight asymptotically standard normal 
pivots, in addition to the signed root likelihood ratio statistic R( 0). 

Consider four pivots that involve observed information. For R(ip), we have £^ st = A lr A st + \\ lr T st and 


e 


= iA“A 


ru v sv \ 


tuv 


+ \\ U T ru T sv 


X tuv , and hence, £ 


+ &A tu T rs = \\ u \ ru v sv \ tuv + \\ lt v uv \ tuv T r 


Example 1. Wald statistic with observed information. For the Wald statistic defined by Two (ip) = 
(tp — ip){ — Mn} 1 / 2 = (-0 — 0){ —L 11 } -1 / 2 , we have £^ 0 = £^ st and £^ 0 = A X u X ru o sv X tuv . Therefore, 
£{ 'f l Q = |A lr A ls and + C,wo^tu'r rs = Cr + P.r ^t u T rs . We deduce that, to error of second order, 
Twofy) is b°th stable in the sense discussed in Section 4 and produces the same p-values as R(ip). 
Example 2. Score statistic with observed information. For the score statistic defined by T S o(f>) = 

and = \X lt X ru u sv X tuv +\X u T ru r sv X t 
It follows that, to error of second order, 


Mi( 0 ){-M n } 1 / 2 = Ti{ 0 ( 0 )}{-L 11 } 1 / 2 , wehave££g = £^ 4 ■ 


^X tu T r 


Thus, Cs s o = i^ lr A ls and £& + ^t u r rs = ££ + £ 

Twoib) is also stable and again produces the same p-values as R(ip). 

The following two asymptotically standard normal pivots are not standard components of likelihood- 
based inference. They involve pivots constructed by evaluating the observed information at the con¬ 
strained maximum likelihood, rather than the global maximum likelihood estimator as in Examples 1 
and 2. Their use can be more cumbersome; they are included here to demonstrate the theoretical results. 

Example 3. Wald statistic with observed information evaluated at the constrained maximum likelihood 
estimator. For the pivot TwocW = (0 - 0)[-^n{0(0)}] 1/2 = (0 - 0)[-L n {0(0)}] 1/2 , we have 
Cwbc = Crf and ^oc = |A w A”*i^A tuv + \X u r ru r sv X tuv = £&. Hence, £$> c = AA lr A ls and 
iwoc + £woc^tuT rs = Cr + l f R^tuT rs . Thus T W oc{b) = Tso(b) + O p (n~ l ). To error of second order, 
TwocidP) is stable and produces the same p -values as R(ip). 

Example 4. Score statistic with observed information evaluated at the constrained maximum likelihood 
estimator. For TsocidP) = Mi('0)[-Mn{6'(-0)}] l > 2 = Li{6»(-0)}[-L 11 {6'('0)}] 1 / 2 , the corresponding 
score statistic, we have Csoc = ant l Isoc = ^ U ^ rUl/SV ^tuv = Hvo- Thus, ££oc = f A lr A ls and 
Csoc + i t soc^tuT rs = Cr + £,R^tuT rs . As in the previous example, T SO c(b ) = T wo (ip) + O p (n~ l ). To 
error of second order, T W oc(ip) is stable and produces the same p-values as R(ip). 

We consider pivots corresponding to Examples 1-4 above, but based on expected, rather than observed, 
information. 

Example 5. Wald statistic with expected information. For the version of the Wald statistic defined by 
T W e(xP) = $- -0){-A n }" 1/2 , we have Cwr = A rl A st and Cwe = \X lt X ru v sv X tuv + Aa^t^A^A^. 
Then, Cwe = A lr A ls and £{ f E + iwE^tuT rs = Cr + &A tu r rs + \X lt r ru X sv X tU}V + AA^A^t". 

Example 6. Wald statistic with expected information evaluated at the constrained maximum likelihood 
estimator. For the pivot described in Example 5, but with the expected information evaluated at the 
constrained maximum likelihood estimator, T WE c(ip) = (0 — 0)[—A 11 {6 l (0)}]^ 1//2 j we have Cvvec = 


£WE and £ 
trs £tr 

S WEC "h C.WEC 


rs 

WEC 

rs _j_ ctu \ _rs _ c 

I S WEC^tu 1 — S’ 


_ 1 \1 1 \ru-, ,sv \ I 

~ 2 A A V Atuv 
rs ctu 
WE “h Zwe 


AA u A r 
A tu T rs . 


A tuv 4 


1_ ylt^-ru^-sv 


tu,v 


Then, £ 


rs 1 
WEC 


= A lr A ls and 
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Neither Twe( VO nor Twec( VO generally satisfy the above sufficient condition for stability to error of 
order (){rr ] ) and, of course, they do not generally provide p-values that agree with those from R(ip) to 
error of order O p (t 7 , _1 ). However, the 71 -values calculated from T WE (ip) agree with those from T WE c(ip) 
to error of order O r ~ 1 ' 


,n 


Example 7. Score statistic with expected information. For the version of the score statistic defined by 

i'i t\™v sv \ tuv + 

_tu \ st- rs crs \ ctu 1 

SE 


TseW = A 11 } 1 / 2 = L 1 {0('0)}{—A 11 } 1 / 2 , wehave^g = A rl z/ St and££^ = |A W A 


1 \ ItsT-rUsr-sv \ 1 \ 1 tsr-ru \ sv \ 

2 A t t A tuv ~ 2 A ' A A tu,v 

1 \ 1 tsi-ru \sv \ 1 \ 1 t^uv \ si-rs 

-AT A Atu.v ~ -A T At uv T 


Therefore, Sf s s E = 0 and f 


rs 

SE 


+ e s u E ^tuT rs = c R s + e^tuT rs - 


order O p fn " 1 

of order O p (t7 , _1 

-li 


2 '' ' ''tu,v 2 ‘ 

Example 8. Score statistic with expected information evaluated at the constrained maximum likelihood 
estimator. Evaluating the expected information instead at the constrained maximum likelihood estima¬ 
tor, for T SE c{f>) = iETi (V 7 ) [—A 11 {6*('0)}] 1//2 = L 1 {0(^)}[—A 11 -^^)}] 1 / 2 , we have ^ r s s f = \ rl u st and 
Csec = \\ u \ ru v sv \tuv - \\ n T ru P sv \tu, v . Thus, Csec = 0 and ? S ’ EC + efEcAtuT rS = Cs S E + e^tuT rs . 

Neither T SE (f>) nor T SEC (f>) generally satisfy the above sufficient condition for stability to error of 
order 0(n -1 ), and they do not generally provide p -values that agree with those from R(v') to error of 
rf 1 ). However, the p -values calculated from T S e('<p) agree with those from T SE c(f>) to error 
r nr 1 ), although they do not generally agree with those from T WE {f>) and T W ec (f/0 to error 
of order O p (n 

Construction of the asymptotically normal pivot for inference on the interest parameter in the pres¬ 
ence of a nuisance parameter using observed information is therefore key to ensuring that /v-values cal¬ 
culated from the marginal distribution of the pivot, as might be approximated in generality by parametric 
bootstrapping, automatically respect, to second-order, the conditioning on ancillary statistics required 
for inferential correctness. The importance of using observed information instead of expected infor¬ 
mation for approximate conditional inference is, of course, well known, having been argued by Efron 
and Hinkley (1978), who were partly inspired by the discussion given by Pierce (1975) to the paper by 
Efron (1975) on the geometry of exponential families. Our analysis gives a very direct operational inter¬ 
pretation, in terms of the />-values derived from the marginal sampling distributions of commonly used 
pivots. 

Further discrimination between pivots may be based on the requirement of parameterisation invariance, 
that inferential conclusions should not depend on the parameterisation: see, for instance, Pace and Salvan 
(1997, Section 2.11). Requirement of invariance of the inference under reparameterisations which are 
(Bamdorff-Nielsen and Cox (1994, Section 1.5)) interest-respecting would exclude use of Wald statistics: 
see, for instance, McCullagh (1987, Section 7.4). 


7. Extension to adjusted profile likelihood 

The general form of the asymptotically normal test statistic that we have considered, where the statis¬ 
tic is expressible as = r/ l l 2 {Ti + T) + O p {n~ l ), where 7\ = —A lr l r and T 2 is of the form 

T 2 = £ rst l rs lt ~ with £ rst and £ rs assumed to be of order 0 (tt, _ 2 ), covers important special cases 

which are commonly applied. It does not, however, include asymptotically standard normal pivots based 
on adjusted forms of profile likelihood. Fortunately, only a simple change to the analysis is necessary 
is accommodate pivots based on adjusted likelihoods. The criteria for second-order stability and equiva¬ 
lence of p-values are unchanged since, to the order being considered, the version of the pivot based on the 
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adjusted profile likelihood is obtained by a constant, additive adjustment of that based on the unadjusted 
profile likelihood. 

There have been many suggestions to replace the usual profile likelihood function M(ip) by an adjusted 
version M(ip) = M(ip) + B(ip), where B(ip) is an adjustment function which is a function of Y and 
ip only, whose derivatives with respect to ip are of order O p (1). The likelihood ratio statistic based on 
the adjusted profile likelihood is W(ip) = 2{M(ip) — M(ip)}, where ip is the point at which M(ip) is 
maximized. The signed root of the likelihood ratio statistic based on the adjusted profile likelihood is 
R(ip) = sga(ip — ip){W (ip)} 1 ! 2 . 

Following our previous notation, we write Bi(ip) = dB(ip) / dip, B n (ip) = d 2 B(ip)/dip 2 , etc. Let 
Pi = E{Bi(ip)}, fin = E(Bu), etc.; these quantities are assumed to be of order 0(1). Further, let 
bi = Bi(ip) — Pi, bu = Bn(ip) — Pu, etc., with these quantities assumed to be of order O p (n~^ 2 ). 
Assume also that the joint cumulants of nbi, nbn, l r , l rs , etc. are of order 0(n). 

In many instances, a specific adjustment function B(ip) has been proposed to take into account the 
effect of nuisance parameters for inference about ip, notably the modified profile likelihood of Bamdorff- 
Nielsen (1983) and the adjusted profile likelihood of Cox and Reid (1987). Other adjustments with the 
same structure as described above are detailed by Skovgaard (1996), Severini (1998), DiCiccio and Mar¬ 
tin (1993), and Barndorff-Nielsen and Chamberlin (1994). These adjustment functions have the effect 
of reducing the mean of the profile score from order 0(1) to order 0(n -1 ): see, for instance, DiCiccio 
et al. (1996). The adjustment functions have P\ — p + 0( n -1 ), where p = —^A lr u st (|A rst + X rs ,t)- 
Since, in general, E{Mi(ip)} = —p + 0(n~ l ), it follows that E{Mi(ip)} = 0(n _1 ): see McCullagh 
and Tibshirani (1990), DiCiccio et al. (1996). 

Another version of the adjustment function that derives from Bayesian inference based on a prior 
density n(9) is 


B(ip) = “2 l0g 


det[-L ab {g(^)}] ^ 


log 


7T{^)} 

n(9) 


det {-L ab (9)} 

where a, b — 2,..., d. Here {L ab (9)} is the (d — l)x(d — 1) submatrix of {L rs (9)} corresponding to the 
nuisance parameters. This adjustment function arises from the Laplace approximation to n^\ Y (ip), the 
posterior marginal density function for ip, developed by Tierney and Kadane (1986), who showed that 
n^\ Y (ip) = cM(ip){ 1 + 0(?tU 3 / 2 )}, for values of ip such that ip — ip is of order 0(n -1 / 2 ). In this case, 
W(ip) corresponds to the posterior ratio statistic to error of order O p (n~ 3 / 2 ), and d\ = //A lr (l// ,s, A r , st — 
7r r /7r): see DiCiccio and Stem (1994a). Firth (1993) developed particular adjustment functions motivated 
by the specific aim that ip be unbiased to error of order 0(n~ 3 / 2 ). 

For a general adjustment function B(ip), DiCiccio and Stern (1994a) showed that R(ip) = r] l ^ 2 {Ri + 
R -2 + O p (n~ 3 / 2 )}, where Ri — Ri — —A lr l r and R 2 = R 2 — A n /3i; in particular, R(ip) = R(ip) + 
r]~ 1,2 Pi + Op(n _1 ). 


Pierce and Bellio (2006), considering the adjustment functions related to modified profile likelihood 
and Bayesian inference, also observed that, to error of order () p (rr l ), R(ip) differs from R(ip) by only 
a constant, although they did not detail the associated formulae involving Pi. Having made this observa¬ 
tion, Pierce and Bellio (2006) conclude that, to error of order O p (n~ l ), both R(ip) and R(ip) induce the 
same orderings of datasets for evidence against the null hypothesis, and they conclude that, to this order 
of error, ideal frequentist p-values can be based on the distribution of R(ip). 
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We generalize our preceding results by considering hypothesis testing for ip based on a test statistic 
T(ip) = 7 // 2 ( Ti + T 2 ) + Opin' 1 ) where, as before, T) = 7) — —A lr l r , and f 2 is assumed to be of the 
form T 2 = i rst l r sU ~ t rs 0h + S' = T 2 + q, with £f st and £ rs of order 0(n~ 2 ) and the constant q assumed 
to be of order 0(n _1 ). Therefore, T(ip) = T(ip) + rfOq + 0(nr 1 ). We provide illustrations which 
demonstrate how statistics constructed from adjusted profile likelihood may be expressed in this form. 

Since T(ip) only differs, to the second-order being considered, from T(ip) by a constant, the condition 
for T(ip) to be stable to error of order 0(n ~ x ) is the same as the condition for T(ip), namely £ rsl = 
|A lr A ls . 

The first three cumulants of T(ip) = T(ip) + rfOq + 0(n' x ) are + rfiOq + 0(n _1 ), k 2 = 

k 2 + 0(n _1 ), R 3 = k 3 + 0(n _1 ), where K\, k 2 , and k 3 are as described before for T(ip), and the fourth- 
and higher-order cumulants of T(ip) are of order ()(rr 1 ), or smaller. 

Consider two versions of T(ip), say T(ip) -\-r] 1 ^ 2 q + 0(n ~ 1 ) and T(ip) +r/ 1 ^ 2 q+0(n~ 1 ). The preceding 
Comish-Fisher argument for comparing p- values shows that the p- values from the two test statistics differ 
by order () p (rr l ) provided 

{rj 1 / 2 (T 2 + q - T 2 - q) - |(k 3 - ^)v T i} ~ {(«i + V /2 ? - «i - V 1 / 2 s) ~ g(« 3 - k 3 )} = Opin' 1 ). 

The crucial point is that the terms involving q and q cancel from the left side of this expression, irrespec¬ 
tive of their values, so © and © continue to specify necessary and sufficient conditions for the two test 
statistics to yield /j- values that differ by order Opin' 1 ). 

Example 9. Signed root likelihood ratio statistic constructed from adjusted profile likelihood. For 
the signed root likelihood ratio statistic constructed from the adjusted profile likelihood, R(ip), standard 
calculations show that ff 1 = qp = ri~ ] if. It follows that, to error of order Opin' 1 ), 

R(ip) and R(ip) produce the same p-values, as noted by Pierce and Bellio (2006). 

Example 10. Wald statistic with observed information constructed from adjusted profile likelihood. 
For the pivot T AWO (ip) = (ip - ^){-M u (^)} 1/2 , we have Cfwo = two = t awo = two » and 

Sawo = r r l Then, since to error of order Opin' 1 ), T wo (ip) and R(ip) produce the same p-values, it 
follows that TawoW and R(f>) produce the same jv-values to that order of error. 

Example 11. Score statistic with observed information constructed from adjusted profile likelihood. 
For the statistic T A so(ip) = Ml(^){-Mh(^)} 1/2 , we have Caso = tso = ttf, Caso = tso* and 
Saso = Since, to error of order Opin' 1 ), T S o(f>) and R(ip) produce the same p-values, it follows 

that Taso (ip) and R(ip) produce the same 71 -values to that order of error. 

The interesting feature here is that although R(ip), T A wo(fp), and Taso(' 1 p) differ from one another by 
non-constant terms of order Ojfin'^ 2 ) in general, they all produce the same y>values to error of order 
Opin' 1 ). 


8. Vector-valued interest parameter 

Consider again the partition 6 = (ip, (p), but now allow for the possibility that the interest parameter ip 
is vector-valued, having dimension q. The likelihood ratio statistic W (ip) is routinely used for hypothesis 
testing about ip. The asymptotic distribution of W (ip) is chi-squared with q degrees of freedom. Indeed, 
for regular problems, the Xq- approximation to the distribution of W (ip) has error of order (){rC 1 ), and 
moreover, the mean of W (ip) has the expansion E{W ( 7 /))} = q( 1 + n ' 1 u) + 0(n' 2 ), where uj = c 0 ( 6 ) 
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is of order 0(1). Lawley (1956), Bamdorff-Nielsen and Cox (1984), and Bickel and Ghosh (1990) 
showed that W (0) is distributed as (1 +n~ 1 co)xl to error of order 0(n~ 2 ): the Bartlett-corrected statistic 
W (0) /(I + n _1 o;) is distributed as Xq to error of order 0(n~ 2 ). Further, W (0) is stable. 

Theorem 4. The unconditional and conditional distributions ofW( 0) agree to error of order 0(n~ 3/2 ), 
given the ancillary statistic A. 

Proof. By applying identical arguments to the conditional distribution of Y given A, we have that 
E{W (0)} = q( 1 + 77, _1 cu) + 0(n~ 2 ), where 0 is of order 0(1) given A, and that W (0) is conditionally 
distributed as (1 + 77, -1 a;)Xg to error of order 0(n” 2 ) given A. Bamdorff-Nielsen and Cox (1984) showed 
that u = ui + O p (nr 1 ri ), and hence it follows that W (0) is stable to error of order 0(n~ 3,/2 ). Extending 
the arguments of McCullagh (1987, Section 8.4) to the nuisance parameter case, u = co + O p (n~ 1 / 2 ) 
continues to hold provided the conditioning statistic A is a second-order local ancillary statistic. □ 

Inference based on an approximation to the marginal distribution of W (0) accurate to error of order 
0 (n 3/2 ) therefore automatically respects conditioning on the ancillary statistic to that same order. 

Bickel and Ghosh (1990) explicitly recommended that the Bartlett adjustment factor (1 + n~ l u>) be 
estimated by simulation; this may be done by either fixing 6 = 6 or 9 = 6 , so that inference is based on 
a Xq approximation to the sampling distribution of, say, W (0) /{I + n~ 1 uj(6)}. Alternatively, the entire 
distribution of W (0) may be approximated by simulation at either of these parameter values: such an ap¬ 
proximation is, however, likely to be computationally more expensive than estimation of just the Bartlett 
adjustment factor. In view of the stability result above, these inference procedures not only provide p- 
values that are uniformly distributed to error of order O p (n~ 3 ^ 2 ) (actually, the error is of order O p (n~ 2 ) - 
see Bamdorff-Nielsen and Hall (1988)), but these p-values are uniformly distributed conditionally to the 
same order of error. 

DiCiccio and Stem (1994b) demonstrated the efficacy of Bartlett correction for likelihood ratio statis¬ 
tics based on adjusted profile likelihoods. They showed that E{W(f>)} = q( 1 + n _1 a)) + 0 (tt, - 2 ) and 
that W (0) is distributed as (1 + rr l 'jj)x'l t° error of order 0(rir 2 ). Moreover, their calculations can be 
applied to the conditional distribution of Y given A to show that these results also hold conditionally, as 
for kF(0). 

Theorem 5. The unconditional and conditional distributions ofW(ip) agree to order 0(rX 3,/2 ), given 
the ancillary statistic A. 

Proof See Appendix. □ 

The operational consequences of this stability result are again straightforward. Similar stability results 
hold for other test statistics that are asymptotically distributed as y 2 , such as (0“ — 0 a )(0 fc — Jp b )S a b and 
M a (0)M 6 (0)S' ab , where S ab = — M a6 (0) and ( S ab ) is the q x q matrix inverse of (Sab). The marginal 
distribution function of such a statistic X typically has the expansion 

k 

Pr(X <x) = Pr(x\ < x) + ^ ajPr(x 2 q+2 j < x) + 0(n~ 3/2 ), 

3=0 


STABILITY AND UNIQUENESS OF p -VALUES FOR LIKELIHOOD-BASED INFERENCE 


15 


where the cl, are functions of the A’s and /3’s and typically k = 3; see, for example, Harris (1985) and 
Cordeiro and Ferrari (1991). The same manipulations of likelihood quantities that produce the approxi¬ 
mation to the marginal distribution of X can be applied to conditional likelihood quantities to yield the 
expansion 

k 

Pr(X < x | A) = Pr(x 2 q < x) + ^ j a j Pr{x 2 q+2 j < x ) + O p (n~ 3/2 ), 

3=0 

where the ay are functions of the A’s and /3’s. The preceding calculations that demonstrate the stability 
of W(ip) can also be used to show that ay = a 3 + O p (n~ 3 / 2 ), and it follows that X is stable to error of 
order 0 (n ~ 3//2 ). 


9. Discussion 

Focus here has been on inference on an interest parameter in the presence of a nuisance parame¬ 
ter in ancillary statistic models. We have shown that commonly used, asymptotically standard normal, 
likelihood-based pivots, including the signed root statistic Rfy), are second-order stable. When applied 
with such a pivot, procedures such as the parametric bootstrap, which approximate the marginal distri¬ 
bution of the pivot to second-order, achieve the same order of accuracy, 0 (n ~ 1 ), in approximation of 
the relevant exact conditional inference. Our motivation for the analysis here is as a preliminary to full 
evaluation of the properties of such parametric bootstrap procedures as an alternative to more awkward 
analytic approaches to approximation of exact conditional inference. In this regard, of importance for 
future investigation is analysis of large deviation properties of procedures based on marginal simula¬ 
tion of a likelihood-based pivot. Analytic procedures, such as normal approximation to R*(^), or the 
approximation of Skovgaard (1996), confer large deviation protection, typically providing accurate ap¬ 
proximation of the conditional distribution of the associated pivot far into its tails. The requirement of 
such large deviation behaviour may be judged an important discriminant between competing methodolo¬ 
gies. Discussion of this and related issues is currently in preparation in DiCiccio, Kuffner and Young 
(2014). 

Pivots stable to third-order do, of course, exist: R is distributed as standard normal to third-order, 
conditionally on the ancillary statistic, and hence unconditionally as well. Second-order approximation 
to an exact conditional inference through the bootstrap is seen (see, for example, DiCiccio and Young 
(2010), Young and Smith (2005, Chapter 10)) to give good results in practice in ancillary statistic settings. 
Basing inference on a pivot stable to third-order seems unwarranted. In addition, ancillary statistics are 
typically not unique and (see, for instance, McCullagh (1992)), different conditional inferences typically 
only agree to second-order, so it can be argued that third-order approximation to an exact conditional 
inference is, in itself, unwarranted. By our analysis, inference based on second-order (or higher-order) 
approximation of the marginal distribution of a pivot stable to second-order approximates any conditional 
inference to (){n 1 ). 

Our study of uniqueness of p-values yielded simple conditions under which p-values derived from 
different asymptotically standard normal pivots agree to order O p {n~ l ). In cases we have considered 
where the conditions fail to be satisfied, a more detailed analysis shows that jj- values agree only to an 
actual order O p (n ~ 1 / 2 ). 
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Appendix 

Proof of Lemma 2. The unconditional variance of T(-0) is 

var{T(V0} = E{{Tm 2 } - [£{T(V>)}] 2 = £[{T(V>)} 2 ] + O (n" 1 ) 

= V E{Tf + 2 T x T 2 + O p (n~ 2 )} + O^’ 1 ) 

= r/E{\ lr \ ls l r l s - 2A lr i stu l r l st l u + 2X lr C t U s l t + O p {n~ 2 )} + 0(n~ l ) 

= — r]{X lr X ls X rs + 0(n~ 2 )} + 0(n -1 ) 

= 1 + 0(n~ l ). 

Correspondingly, the conditional variance of T(-0) is 

var{T(V0} = E[{Tm 2 ) ~ [ A{W)}] 2 = A[{T(V>)} 2 ] + O^n" 1 ) 

= V E{T 2 + 2 T{T 2 + O p (n~ 2 )} + Opin' 1 ) 

= V E{X lr X ls lJ s - 2X w e tu lr(lst + A st )i u + 2X lr e t UJt + O p {n~ 2 )} + Opin' 1 ) 
= -? 7 {A lr A ls A rs - 2 X lr £ stu °X ru A st + O p (n" 2 )} + Opin' 1 ) 

= —ti{A lr A ls (A rs + A rs ) - 2 X lr £ stu X ru A st } + Opin' 1 ) 

= 1 - V ^ lr ^ ls Ks - 2C a A st ) + Opin' 1 ) 

= 1 - V {(X lr X ls - 2f- 1 )A„} + O p (n~ l ). 


It follows that var{T 1 ( , 0)} = var+ O p (n x ) provided £ rsl = |A lr A ls . 


□ 


Proof of Lemma 3. The unconditional skewness of T(t/>) is 

skew{T(V0} = E{{T{^) - E{Tmf) = E[{T^)Y] - 3f?[{T(^)} 2 ]f?{T(^)} + 0{n~ l ) 
= rft 2 \E{(Ti + T 2 ) 3 } - 3 E{(T X + T,) 2 }^ + T 2 )] + O^' 1 ) 

= r/ 3/2 [ff{T| 3 + 3T 2 T 2 + Op(n~ bl2 )} - 3E{Tf + Op{n~ 3/2 )}E(T 2 )] + Ofa" 1 ) 

= 77 3 / 2 [E{-A lr A ls A u / r y t + 3X lr X ls (£ tuv l t Jv ~ e u hL)lrls + O p (n- 5 / 2 )} 

- 3£ , {A lr A ls Z r Z s + O p (n- 3/2 )}{r%M + f s A rs + O p (n ~ 3/2 )}] + O^” 1 ) 
= rf/ 2 {E(-X lr X ls X u l r l s l t + 3X lr X l8 C tuv l r lsk u lv - 3X lr X ls ? u l r l s l t l u 

- 3X lr X ls e uv U s X tu , v - 3X lr X ls e u l r lsXtu)} + 0{n~ l ). 


To continue the calculation, we make use of the following identities: 

—E(l r l s l t ) = X 

rs,t “1“ ^ rt,s ^st,r ""t - ^rsti 

E(l r l s l tu l v ) ^rs^tu,v ^rv^tu,s ^sv^tu,r 0(Tl ^ ), 

^rs^tu ^rt^su + A ru X s t + 0(n 3 ^). 
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By using these identities, we obtain 

skew{T( , 0)} = 77 3 / 2 (3A lr A ls A u A rS)t + X lr X ls X u X rst 

Q\ll £tuv \ Q\lsctul\ Q\lrctlll\ 

— OA Z; A tu ,v — JA Z; A tu , s ~ oA Z; A tu , r 

- 3x u e u x tu - 3^ n - 3e ri 

+ 3X n £, tuv X tu>v + 3X 11 ^ tu X tu ) + 0{n 3 ) 

= ? 7 3/2 (A lr A ls A lt A r . st + 3A lr A ls A u A rs , t - 6£ rsl A u A rs , t - 6£ n ) + 0(n~ l ). 

Similar reasoning shows that the conditional skewness of T(ijj) is 

skew{T(^)} = r] 3/2 [E{-X lr X ls X u l r l s l t + 3X lr X ls £ tuv l r l s l tu l v - 3X lr X ls £ tu l r l s l t l u 

- 3A lr A ls e uv lrlsX tu , v - 3X lr X ls e u UsX tu + O p (n- 5 / 2 )}] + O^n" 1 ) 

= r) 3 ' 2 [E{-X lr X ls X u i r Ut + 3X lr X ls e uv Us(L + A tu )i v - 3X lr X ls e u UJtk 

- 3X lr X ls e uv ttsXtu,v - 3X lr X ls e u UsXtu}} + O^n- 1 ) 

= r) 3/2 {E(-x lr x ls x u i r iJ t + 3X lr x ls z tuv i r iJJ v - 3x lr x ls z tu Usi t L 

- 3X lr X ls e uv UsXtu,v - 3X lr X ls e u UsX tu )} + O p {n~ l ). 

Now we use the following identities: 

— E(l r l s l t ) = A 

rs,t ^rt,s ^ st,r ^rst 

^ rs,t ^rt,s ^st,r “t - ^rst ^ ) 

— A r ,s,t + Op^n 1 / 2 ) 

= -E(l r l s l t ) + O p (n V 2 ), 

EiJj-plgliyly^ ^rs^tu,v ^rv^tu,s ^sv^tu,r H - Op(fl ^ ) 

^rs^tu,v ^rv^tu,s ^sv^tu,r O p {jl ^ ) 

— E(l r l s l tu lv) + ^p(^ 3//2 )? 


E{l r l s l t l u ) ^rs^tu “I” ^rt ^ su + A ru X st + O p (n 3 ^ 2 ) 

Ks^tu “I” ^ rt ^ su + A ru X st + O p (n 3 / 2 ) 

= E{l r l s l t l u ) + O p (n 3/2 ). 


By using these identities in the preceding expression for skew{T(^)}, it is apparent that skew{T(^)} = 
skew {T (?/>)} + Op (nr 1 ), and hence, the conditional third cumulant agrees with the unconditional one to 
error of order O v (n ~ l ), as required. □ 
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Proof of Theorem 5. To establish the stability of W (0) to error of order O(n 3/2 ), we need only show 
that E{W (0)} = E{W (0)} + ()p(rr 3/ " 2 ). For full generality, the previous notation, which is applicable 
when 0 is a scalar, must be extended. In the expressions that follow, it is assumed that subscripts and 
superscripts a,b,... have the range 1,..., q, while r,s,... range over 1,..., d. Let ( r ] ab ) be the q x q 
matrix inverse of (— X ab ), let r rs = rj ab X ar X bs , and let u rs = X rs + r rs . In addition, let B a (vi) = 
d5(0)/d0“,5 a6 (0) = d 2 B(^)/drd^ b ,P a = E{B a (ilj)},p ab = E{B ab (^)},b a = 5 a (0) - BaXb = 
Bab(ip) — Bab, and so forth. The constants Ba, Bab etc. are assumed to be of order 0(1) and the variables 
b a ,b ab etc. are assumed to be of order O p (n _1 ' /2 ). Finally, it is assumed that the joint cumulants of 
nb a , nb abl l r . l rs , and so forth are of order 0(n). 

DiCiccio & Stem (1994b) showed that 

W{i>) = IU(0) - 2X ar /3 a l r - 2X ar bJ r + 2X ar X st BaUt - X ar X su X tv p a X rst lJ v 
+ X ar X bs PM - X ab (3 a (3 b + O p (n~ 3 / 2 ), 


and it follows that 

E{W{^)} = E{W( 0)} - 2A ar E(b a l r ) + X ar X st (3 a (2X rSjt + X rst ) - X ab ((3 ab + f3 a (3 b ) + 0(n" 3 / 2 ) 

= E{Wm + X ar X st f3 a (2X rs , t + X rst ) - 2X ar f3 a/r + X ab (p ab - /3 a p b ) + 0(n" 3 / 2 ), 

where (3 a / r = d(3 a /d6 r . For calculating E{W('ip)}, we assume that 5(0) is a function of Y and 0 
only, so, in particular, it does not depend on 0. Thus, differentiation of the identity /3 a = E{B a (iX)} 
yields /3 a/b = E(b a l b ) + /3 ab and /3 a/i = E{b a li) for i = q + 1,..., d. It follows that A ar E(b a l r ) = 

x ar p a/r - x ab p ab . 

To calculate E{W (0)}, some care is required about the conditional properties of 5 a (0), B ah (tp). and 
so forth. The quantities (3 a = 5{5 a (0)}, 0 ab = 5{5 ab (0)}, etc. are assumed to be of order O p ( 1), 
while b a = 5 a (0) — /3 a , b ab = 5 ab (0) — (3 ab , etc. are assumed to be of order O p (t 7, -1 / 2 ). Finally, it is 
assumed that the joint conditional cumulants of nb a , nb ab , l r , l rs , and so forth are of order O p (n). 

Under the preceding assumptions, it is possible to determine the orders of the differences d a — B a 
and Bab ~ Bab■ Since E0 a ) = ^[^{^a(0)}] = 5{5 a (0)} = Ba and var 0a) = var[5{5 a (0)}] = 
var{5 a (0)} — 5[var{5 a (0)}] = 0(n _1 ) — 5{var(fe a )} = 0(n -1 ) — 5{0 p (n _1 )} = 0(n _1 ), it follows 
that Ba = Ba + O p (n ~ 1 / 2 ). A similar argument shows that Bab = Bab + O p (n -1 / 2 ). We assume that 
differentiation of the identity Ba — Ba + O p (n ~yields B a / r = Ba/r + 0 ;) (n~ 1//2 ). 

Now, define 5 a = /3 a — Ba, so that d a is a function of 6 and A of order O p {n~ l B). Furthermore, 
b a = 5 a (0) - Ba = #a(0) - Ba + L = b a + L- To calculate E{W (0)}, we observe that 

1U(0) = VF(0) - 2A aT BJr ~ 2X ar b a l r + 2X ar X St BaUt ~ X ar X SU X tV BaKstUv 

+ x ar x bs BabUs - x ab BaBb + o p (n~ 3/2 ) 

= IU(0) - 2A ar Bal - 2A ar (b a + S a )l r + 2X ar X St Ba(L + Ks)h ~ A“ r X™X tV BaXrstUv 

+ x ar x bs BaJJ s - x ab BaBb + o P (n" 3/2 ), 
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and thus 

E{W^)} = E{W{i 0)} - 2A ar bJ r + 2X ar X st (3 a Xrs,t + X ar X su X tv /3 a X rst °X uv 
- \ ar \ bs p ab \ rs - X ab f5 a f5 b + O p (n~ 3 / 2 ). 

Bamdorff-Nielsen & Cox (1984) showed that E{W{f>)} = E{W('ip)} + O p (n~ 3 / 2 ) ; recall that X rs = 

X rs + Op(n 1//2 ) and °X rs j = X rSjt + O p (n 1//2 ). Then, X ru X st X ut = X rs + O p (n~ 3 / 2 ), and 

E{WW} = E{wm + X ar X st f3 a (2X rs , t + X rst ) - 2X ar E(bJ r ) - X ab {(5 ab + f5 a (5 b ) + O p (n- 3 / 2 ). 

Now, using the result that X ar E(bJ r ) = X ar ft a / r — X ab j3 ab = X ar /3 a / r — X ab /3 ab + (^^(n -3 / 2 ), which holds 
since /3 a/r = /3 a/r + O p (n _1/2 ) and /3 afe = /3 ab + O p {n~ l/2 ), we have 

E{1C(^)} = E{W^)} + X ar X st p a (2X rStt + X rst ) - 2X ar (3 a/r + X ab ((3 ab - (3 a (3 b ) + O p (n ~ 3/2 ) 

= E{Wm + O p (n~ 3 / 2 ), 

as required. □ 
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